Polycystic ovary syndrome (PCOS) is a complex condition characterized by elevated androgen levels, menstrual irregularities, and/or small cysts on one or both ovaries.
The disorder can be morphological (polycystic ovaries) or predominantly biochemical (hyperandrogenemia).
Hyperandrogenism, a clinical hallmark of PCOS, can cause inhibition of follicular development, microcysts in the ovaries, anovulation, and menstrual changes.
library(dplyr)    # mutate()
library(ggplot2)  # ggplot(), ggsave()

# Convert the categorical columns to factors
pdata$Oligomenorrhea <- as.factor(pdata$Oligomenorrhea)
pdata$Polycystic.Ovarian.Morphology <- as.factor(pdata$Polycystic.Ovarian.Morphology)
pdata$Hyperandrogenism <- as.factor(pdata$Hyperandrogenism)
pdata$Category <- as.factor(pdata$Category)
# Label participants with a BMI above 25 as overweight
pdata <- mutate(pdata, Weight.Category = ifelse(BMI > 25, "Overweight", "Healthy/Low"))
pdata$Weight.Category <- as.factor(pdata$Weight.Category)
head(pdata)
##   Specimen.id. Age  BMI Hirsutism Testosterone.ng.mL Oligomenorrhea
## 1            1  34 24.8         8               0.26       No Oligo
## 2            2  23 27.4         9               0.60       No Oligo
## 3            3  26 44.0         4               0.32 Oligomenorrhea
## 4            4  32 29.2        10               0.85 Oligomenorrhea
## 5            5  20 25.8         8               0.49 Oligomenorrhea
## 6            7  26 33.4         6               0.73 Oligomenorrhea
##      Hyperandrogenism Polycystic.Ovarian.Morphology   Category Weight.Category
## 1    Hyperandrogenism                   PCO ovaries    HA+PCOM     Healthy/Low
## 2    Hyperandrogenism                   PCO ovaries    HA+PCOM      Overweight
## 3 No Hyperandrogenism                   PCO ovaries Oligo+PCOM      Overweight
## 4    Hyperandrogenism                   PCO ovaries       PCOS      Overweight
## 5    Hyperandrogenism                No PCO ovaries       PCOS      Overweight
## 6    Hyperandrogenism                   PCO ovaries       PCOS      Overweight
names(pdata) #verify all variables are present
## [1] "Specimen.id." "Age"
## [3] "BMI" "Hirsutism"
## [5] "Testosterone.ng.mL" "Oligomenorrhea"
## [7] "Hyperandrogenism" "Polycystic.Ovarian.Morphology"
## [9] "Category" "Weight.Category"
Hypothesis: higher testosterone levels and greater age are each associated with a higher BMI.
ggplot(
  pdata,
  aes(
    x = Testosterone.ng.mL,
    y = BMI
  )
) +
  geom_point(size = 1) +
  theme_minimal() +
  geom_smooth(
    formula = 'y ~ x',
    method = 'lm',
    se = FALSE
  ) +
  labs(
    x = "Testosterone (ng/mL)",
    y = "BMI",
    title = "Testosterone Levels v. BMI"
  )
ggsave("Testosterone-v-BMI.png", height=4.5, width = 4.5, units = "in")
ggplot(
  pdata,
  aes(
    x = Age,
    y = BMI
  )
) +
  geom_point(size = 1) +
  theme_minimal() +
  geom_smooth(
    formula = 'y ~ x',
    method = 'lm',
    se = FALSE
  ) +
  labs(
    x = "Age",
    y = "BMI",
    title = "Participant Age v. BMI"
  )
ggsave("Age-v-BMI.png", height=4.5, width = 4.5, units = "in")
ggplot(
  pdata,
  aes(
    x = Testosterone.ng.mL,
    y = BMI
  )
) +
  scale_colour_gradient(
    low = "red",
    high = "green",
    space = "Lab",
    na.value = "grey50",
    guide = "colourbar",
    aesthetics = "colour"
  ) +
  # Map colour in the point layer only: geom_smooth() cannot use a continuous
  # colour aesthetic and otherwise warns that it was dropped.
  geom_point(
    aes(colour = Age),
    size = 1
  ) +
  theme_gray() +
  geom_smooth(
    formula = 'y ~ x',
    method = 'lm',
    se = FALSE
  ) +
  labs(
    x = "Testosterone (ng/mL)",
    y = "BMI",
    title = "Testosterone Levels v. BMI",
    subtitle = "colored by age"
  )
ggsave("Testosterone+Age-v-BMI.png", height=4.5, width = 6, units = "in")
Comment about the plots. Statement justifying no interaction effects between the two IVs. Modifying the hypothesis.
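One way to support the no-interaction statement is to compare the additive model with one that also includes a Testosterone by Age term. The sketch below does this on simulated data, because the loading of `pdata` is not shown here; the data frame `sim`, its generating coefficients, and the names `additive`, `with_int`, and `cmp` are illustrative assumptions, not the study data or the author's code.

```r
# Simulated stand-in for pdata (assumption: same column names, made-up values).
set.seed(1)
n <- 175
sim <- data.frame(
  Testosterone.ng.mL = runif(n, 0.1, 1.2),
  Age = sample(18:40, n, replace = TRUE)
)
sim$BMI <- 15 + 5 * sim$Testosterone.ng.mL + 0.25 * sim$Age + rnorm(n, sd = 5)

# Additive model versus the same model with an interaction term added.
additive <- lm(BMI ~ Testosterone.ng.mL + Age, data = sim)
with_int <- lm(BMI ~ Testosterone.ng.mL * Age, data = sim)

# Nested-model F test: a large p-value means there is no evidence that
# the interaction term improves the fit.
cmp <- anova(additive, with_int)
print(cmp)
```

Running the same comparison on `pdata` would give a p-value for the interaction term that the justification can cite directly.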
# No family is given, so glm() fits a Gaussian model (equivalent to lm())
m1 <- glm(BMI ~ Testosterone.ng.mL + Age, data = pdata)
summary(m1)
##
## Call:
## glm(formula = BMI ~ Testosterone.ng.mL + Age, data = pdata)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -7.8367 -3.8653 -0.8624 2.1259 20.9440
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 14.58869 2.70096 5.401 2.18e-07 ***
## Testosterone.ng.mL 5.06079 2.07240 2.442 0.01562 *
## Age 0.26338 0.08129 3.240 0.00144 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 28.36949)
##
## Null deviance: 5265.0 on 174 degrees of freedom
## Residual deviance: 4879.6 on 172 degrees of freedom
## AIC: 1087
##
## Number of Fisher Scoring iterations: 2
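Written as an equation, the coefficients in the summary above give the fitted line below. The second line works it through for a hypothetical participant (testosterone 0.5 ng/mL, age 30), chosen only for illustration.

```latex
\[
\widehat{\mathrm{BMI}} = 14.589 + 5.061 \cdot \mathrm{Testosterone} + 0.263 \cdot \mathrm{Age}
\]
\[
\widehat{\mathrm{BMI}}(0.5,\ 30) = 14.589 + 5.061\,(0.5) + 0.263\,(30) \approx 25.0
\]
```

Each additional 1 ng/mL of testosterone is therefore associated with roughly 5 more BMI units at a fixed age, and each additional year of age with roughly 0.26 more at a fixed testosterone level.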
library(ggfortify)  # supplies the autoplot() method for lm/glm diagnostics
autoplot(m1)
Statement justifying changing the model to a Poisson GLM (AIC; residual deviance compared with the Df).
##
## Call:
## glm(formula = BMI ~ Testosterone.ng.mL + Age, family = "poisson",
## data = pdata)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.6277 -0.8110 -0.1694 0.4332 3.8700
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.793357 0.102725 27.193 < 2e-16 ***
## Testosterone.ng.mL 0.204905 0.077715 2.637 0.008374 **
## Age 0.010736 0.003071 3.497 0.000471 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 202.67 on 174 degrees of freedom
## Residual deviance: 186.92 on 172 degrees of freedom
## AIC: Inf
##
## Number of Fisher Scoring iterations: 4
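The `AIC: Inf` in the summary above is what R reports when a Poisson family is fitted to a non-integer response such as BMI, so AIC cannot be used to compare this fit with the Gaussian one. A possible alternative for a positive, continuous outcome is a Gamma GLM with a log link, which keeps the multiplicative reading of the coefficients. The sketch below uses simulated data; `sim`, its generating values, and the choice of a Gamma family are assumptions for illustration and are not part of the original analysis.

```r
# Simulated stand-in for pdata (assumption: same column names, made-up values).
set.seed(2)
n <- 175
sim <- data.frame(
  Testosterone.ng.mL = runif(n, 0.1, 1.2),
  Age = sample(18:40, n, replace = TRUE)
)
# Positive, continuous response whose mean is linear on the log scale.
mu <- exp(2.8 + 0.2 * sim$Testosterone.ng.mL + 0.01 * sim$Age)
sim$BMI <- rgamma(n, shape = 30, rate = 30 / mu)

gauss_fit <- glm(BMI ~ Testosterone.ng.mL + Age, data = sim)
gamma_fit <- glm(BMI ~ Testosterone.ng.mL + Age,
                 family = Gamma(link = "log"), data = sim)

# Both families define a likelihood for continuous BMI, so both AIC
# values are finite and can be set side by side.
aic_table <- AIC(gauss_fit, gamma_fit)
print(aic_table)

# exp() of a log-link coefficient is a multiplicative effect on mean BMI.
print(exp(coef(gamma_fit)))
```

The same reading applies to the log-link coefficients reported above: exp(0.204905) is about 1.23, which corresponds to a mean BMI roughly 23% higher per additional ng/mL of testosterone at a fixed age.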
[Figure: interactive plotly 3D scatter plot (scatter3d trace, markers mode)]